Using the "Students Performance in Exams" dataset, you can answer a variety of questions that explore the relationships between student demographics, socio-economic factors, and academic performance. Here are some key questions that can be addressed with this dataset:
Gender and Performance: How do average scores in math, reading, and writing differ between male and female students? Are there significant differences in performance between genders across different subjects?
Parental Education and Performance: Is there a correlation between the parental level of education and student performance in math, reading, and writing? Do students whose parents have higher education levels perform better on average?
Test Preparation Course: How does completing a test preparation course impact student scores in math, reading, and writing? What percentage of students who completed the test preparation course scored above a certain threshold?
Lunch Type: Does the type of lunch (standard vs. free/reduced) affect student performance? How do average scores compare between students who receive standard lunch and those who receive free/reduced lunch?
Race/Ethnicity and Performance: How do average scores vary across different race/ethnicity groups? Are there notable performance gaps between different racial/ethnic groups?
Subject-wise Performance: How do students' performances in math, reading, and writing compare? Are there students who excel in one subject but perform poorly in another?
Score Correlation: Is there a correlation between math scores and reading scores? Between reading scores and writing scores? What are the strongest predictors of student performance in each subject?
Score Distribution: What is the distribution of scores in math, reading, and writing? Are there any outliers or unusual patterns in the score distributions?
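Several of these questions take only a few lines of pandas to answer. The sketch below computes the correlation matrix for the score-correlation question. It also flags students whose best and worst subjects are far apart, which addresses the subject-wise question. It is a minimal illustration, not the notebook's own method. The 30-point gap, the helper names `score_correlations` and `uneven_students`, and the toy rows are all hypothetical placeholders. With the real data, pass the DataFrame loaded from exams.csv instead.

```python
import pandas as pd

SCORES = ["math score", "reading score", "writing score"]

def score_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix between the three score columns."""
    return df[SCORES].corr()

def uneven_students(df: pd.DataFrame, gap: int = 30) -> pd.DataFrame:
    """Rows whose best and worst subject scores differ by at least `gap` points.

    The default gap of 30 is an arbitrary, hypothetical cutoff.
    """
    spread = df[SCORES].max(axis=1) - df[SCORES].min(axis=1)
    return df[spread >= gap]

# Made-up rows for illustration only; replace with the real exams.csv data.
toy = pd.DataFrame({
    "math score":    [90, 55, 70, 40],
    "reading score": [85, 60, 72, 80],
    "writing score": [88, 58, 69, 78],
})
print(score_correlations(toy).round(2))
print(uneven_students(toy))  # only the last row has a 40-point spread
```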
Let’s start by loading the student performance data.
import pandas as pd
import altair as alt
# Load the dataset
data = pd.read_csv("exams.csv")
data.head()
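Every later cell refers to columns by their exact names. A quick check right after loading therefore catches a renamed or differently formatted CSV early. This is a sketch that assumes the column names used throughout this notebook. `EXPECTED_COLUMNS` and `missing_columns` are hypothetical helpers, not part of pandas.

```python
import pandas as pd

# Column names exactly as they are used in this notebook (assumed to match exams.csv).
EXPECTED_COLUMNS = [
    "gender", "race/ethnicity", "parental level of education", "lunch",
    "test preparation course", "math score", "reading score", "writing score",
]

def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected column names that are absent from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

# Usage with the loaded data:
# problems = missing_columns(data)
# if problems:
#     raise ValueError(f"exams.csv is missing columns: {problems}")
# print(data[["math score", "reading score", "writing score"]].describe())
```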
In this section, we will explore the relationship between math and reading scores, segmented by gender. The scatter plot will allow us to visualize how male and female students perform in both subjects, helping to identify any trends or disparities in their scores.
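The analysis aims to answer the gender-related question raised at the start. How do male and female students compare in math and reading scores, and does either group perform better in one subject than in the other?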
Let’s proceed with creating the scatter plot.
# Implementing selection: clicking a point selects all students of that gender
selection = alt.selection_point(fields=['gender'])
alt.Chart(data).mark_circle().encode(
    x='math score:Q',
    y='reading score:Q',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender', 'math score', 'reading score'],
    # Selected points stay opaque; unselected points fade
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_params(selection).properties(
    title='Scatter Plot of Math vs. Reading Scores by Gender'
)
The interactive scatter plot above shows that male students tend to score higher in math, while female students score higher in reading. Click a blue or orange point to highlight all students of that gender; the other points fade. To add the other group to the selection, hold Shift and click one of its points.
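We can also check this visual impression numerically. The sketch below computes mean scores per gender. It then runs a simple two-sided permutation test on the difference in means, which addresses the question of significant gender differences raised at the start. It assumes the gender column holds the values "female" and "male". The toy rows are made up, and `mean_scores_by_gender` and `permutation_p_value` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

SCORES = ["math score", "reading score", "writing score"]

def mean_scores_by_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Average of each subject score per gender."""
    return df.groupby("gender")[SCORES].mean()

def permutation_p_value(df: pd.DataFrame, score: str, n_perm: int = 5000, seed: int = 0):
    """Two-sided permutation test for the female-minus-male difference in mean `score`.

    Assumes the 'gender' column contains the labels "female" and "male".
    Returns (observed difference, estimated p-value).
    """
    rng = np.random.default_rng(seed)
    values = df[score].to_numpy(dtype=float)
    is_female = (df["gender"] == "female").to_numpy()
    observed = values[is_female].mean() - values[~is_female].mean()
    extreme = 0
    for _ in range(n_perm):
        # Shuffle the gender labels to simulate "score is unrelated to gender"
        shuffled = rng.permutation(is_female)
        diff = values[shuffled].mean() - values[~shuffled].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # Adding 1 to numerator and denominator keeps the estimate strictly above zero
    return observed, (extreme + 1) / (n_perm + 1)

# Made-up rows for illustration only
toy = pd.DataFrame({
    "gender": ["female"] * 4 + ["male"] * 4,
    "math score": [60, 62, 65, 68, 80, 82, 85, 88],
    "reading score": [80, 82, 85, 88, 60, 62, 65, 68],
    "writing score": [78, 80, 84, 86, 58, 61, 63, 66],
})
print(mean_scores_by_gender(toy))
print(permutation_p_value(toy, "reading score"))
```

Shuffling labels makes no distributional assumptions. The trade-off is that the p-value is a Monte Carlo estimate, so it varies slightly with `n_perm` and `seed`.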
Next, let's add panning and zooming to the same scatter plot so we can examine these trends more closely.
# Implement Exploration (Pan and Zoom) with a title
chart = alt.Chart(data).mark_circle().encode(
x='math score:Q',
y='reading score:Q',
color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
tooltip=['gender', 'math score', 'reading score']
).interactive().properties(
title='Interactive Scatter Plot of Math vs. Reading Scores by Gender'
)
chart
In this version of the scatter plot, scroll to zoom in or out and drag to pan. This makes it easier to examine the relationship between math and reading scores by gender in the dense regions of the plot.
# Implement Abstract/Elaborate with semantic zoom
selection = alt.selection_point(fields=['gender'])
# Overview chart: one bar per gender; clicking a bar drives the selection
overview = alt.Chart(data).mark_bar().encode(
    y='count()',
    x='gender:N',
    color=alt.condition(selection, alt.value("orange"), alt.value("lightgrey"))
).add_params(selection)
# Detail chart: scatter plot filtered to the gender(s) selected in the overview
detail = alt.Chart(data).mark_circle().encode(
    y='reading score:Q',
    x='math score:Q',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender', 'math score', 'reading score']
).transform_filter(selection).properties(
    title='Gender-Based Overview of Student Performance'
)
# Place the two charts side by side
overview | detail
In this view, click a bar in the count chart on the left to filter the scatter plot on the right to students of that gender. To show both genders again, hold Shift and click the other bar.
# Implement Filtering
# Bind the selection to the legend so clicking a legend entry highlights that gender
selection = alt.selection_point(fields=['gender'], bind='legend')
alt.Chart(data).mark_circle().encode(
    x='math score:Q',
    y='reading score:Q',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender', 'math score', 'reading score'],
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_params(selection).properties(
    title='Interactive Filtering: Math vs. Reading Scores by Gender'
)
Click an entry in the "gender" legend to highlight the corresponding students in the scatter plot; the remaining points fade out. This lets you focus on one gender at a time.
# Implement Encoding
import altair as alt
# Assumes 'data' is the DataFrame loaded from exams.csv above
dropdown = alt.binding_select(options=['math score', 'reading score', 'writing score'], name='Select a score:')
# Start with math score selected
selection = alt.selection_point(fields=['Score'], bind=dropdown, value=[{'Score': 'math score'}])
alt.Chart(data).transform_fold(
    # Reshape the three score columns into (Score, Value) pairs
    ['math score', 'reading score', 'writing score'],
    as_=['Score', 'Value']
).transform_filter(
    selection  # keep only the score chosen in the dropdown
).mark_circle().encode(
    x='Value:Q',
    y='Score:N',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender:N', 'Value:Q', 'Score:N']
).add_params(selection).properties(
    width=600,
    height=400,
    title='Dynamic Score Selection: Comparing Math, Reading, and Writing Scores by Gender'
)
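`transform_fold` reshapes the data inside the browser. In pandas, the same wide-to-long step is `DataFrame.melt`, which is handy for checking what the chart actually receives. The sketch below mirrors the fold above, using output columns `Score` and `Value` and keeping only `gender` as the identifier column. It runs on made-up rows; `to_long` is a hypothetical helper.

```python
import pandas as pd

SCORES = ["math score", "reading score", "writing score"]

def to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Pandas counterpart of transform_fold(SCORES, as_=['Score', 'Value'])."""
    return df.melt(id_vars=["gender"], value_vars=SCORES, var_name="Score", value_name="Value")

# Made-up rows for illustration only
toy = pd.DataFrame({
    "gender": ["female", "male"],
    "math score": [70, 80],
    "reading score": [90, 60],
    "writing score": [85, 65],
})
long_df = to_long(toy)
print(long_df)  # 2 students x 3 subjects = 6 rows
# Rows kept when the dropdown is at its default, "math score"
print(long_df[long_df["Score"] == "math score"])
```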
# Create the dropdown selection for lunch
# .tolist() turns the NumPy array of unique values into a plain list for the widget
lunch_options = data['lunch'].unique().tolist()
dropdown = alt.binding_select(options=lunch_options, name='Select Lunch Type:')
selection = alt.selection_point(fields=['lunch'], bind=dropdown, value=[{'lunch': lunch_options[0]}])
# Transform the data to long format for scores
data_long = data.melt(id_vars=['gender', 'lunch'], value_vars=['math score', 'reading score', 'writing score'],
                      var_name='Score', value_name='Value')
# Create the chart, keeping only rows for the selected lunch type
chart = alt.Chart(data_long).transform_filter(
    selection
).mark_circle().encode(
    x='Value:Q',
    y='Score:N',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender', 'Value', 'Score']
).add_params(selection).properties(
    title="Impact of Lunch Type on Student Performance Across Subjects"
)
chart
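The scatter plot shows the spread for one lunch type at a time. A table of means makes the comparison between lunch types more direct. This is a sketch on made-up rows that assumes the lunch column uses the labels "standard" and "free/reduced"; `mean_scores_by_lunch` is a hypothetical helper.

```python
import pandas as pd

SCORES = ["math score", "reading score", "writing score"]

def mean_scores_by_lunch(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of each subject score (columns) for every lunch type (rows)."""
    return df.groupby("lunch")[SCORES].mean()

# Made-up rows for illustration only
toy = pd.DataFrame({
    "lunch": ["standard", "standard", "free/reduced", "free/reduced"],
    "math score": [70, 80, 50, 60],
    "reading score": [75, 85, 60, 66],
    "writing score": [72, 84, 58, 64],
})
table = mean_scores_by_lunch(toy)
print(table)
# Per-subject gap between the two lunch types
print(table.loc["standard"] - table.loc["free/reduced"])
```

As the conclusions below note, a gap in these means is an association only. It does not show that lunch type causes the difference.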
# Load the dataset
data = pd.read_csv("exams.csv")
# Melt the data to long format for easier plotting
data_long = data.melt(id_vars=['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course'],
value_vars=['math score', 'reading score', 'writing score'],
var_name='subject', value_name='score')
# Create a scatter plot whose subject is switched with radio buttons
# (a simple stand-in for animation)
subject_radio = alt.binding_radio(
    options=['math score', 'reading score', 'writing score'],
    name='Select Subject:',
)
selection = alt.selection_point(
    fields=['subject'],
    bind=subject_radio,
    name="subject_selection",
    value=[{'subject': 'math score'}]
)
# Create the chart
chart = alt.Chart(data_long).mark_circle().encode(
    x='score:Q',
    y=alt.Y('subject:N', title=''),
    color='gender:N',
    tooltip=['gender', 'score', 'subject']
).transform_filter(
    selection
).add_params(
    selection
).properties(
    width=600,
    height=400,
    title='Student Performance Across Subjects'
)
chart
import altair as alt
import pandas as pd
# Load the dataset
data = pd.read_csv("exams.csv")
# Melt the data to long format for easier plotting
data_long = data.melt(id_vars=['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course'],
value_vars=['math score', 'reading score', 'writing score'],
var_name='subject', value_name='score')
# Create a dropdown selection for subjects
dropdown = alt.binding_select(options=['math score', 'reading score', 'writing score'], name='Select Subject:')
selection = alt.selection_point(fields=['subject'], bind=dropdown, value=[{'subject': 'math score'}])
# Create the heatmap: each cell is the mean score for one race/ethnicity and parental-education pair
heatmap = alt.Chart(data_long).transform_filter(
    selection
).mark_rect().encode(
    x='race/ethnicity:N',
    y='parental level of education:N',
    color='mean(score):Q',
    tooltip=['race/ethnicity:N', 'parental level of education:N', 'mean(score):Q']
).properties(
    width=600,
    height=400,
    title="Mean Student Scores by Race/Ethnicity and Parental Education Level"
).add_params(
    selection
)
heatmap
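The numbers behind each heatmap cell can be reproduced with `pivot_table`. This is useful for reading exact values or for reordering the parental-education levels. The sketch runs on made-up rows. `mean_score_grid` is a hypothetical helper, and the group labels only imitate the dataset's format.

```python
import pandas as pd

def mean_score_grid(df: pd.DataFrame, subject: str) -> pd.DataFrame:
    """Mean `subject` score per parental education level (rows) and race/ethnicity group (columns)."""
    return df.pivot_table(
        index="parental level of education",
        columns="race/ethnicity",
        values=subject,
        aggfunc="mean",
    )

# Made-up rows for illustration only
toy = pd.DataFrame({
    "race/ethnicity": ["group A", "group A", "group B", "group A", "group B"],
    "parental level of education": ["high school", "high school", "high school",
                                    "master's degree", "master's degree"],
    "math score": [50, 60, 70, 80, 90],
})
grid = mean_score_grid(toy, "math score")
print(grid)
```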
import pandas as pd
import altair as alt
# Load the dataset
data = pd.read_csv("exams.csv")
# Melt the data to long format for easier plotting
data_long = data.melt(id_vars=['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course'],
value_vars=['math score', 'reading score', 'writing score'],
var_name='subject', value_name='score')
# Create a dropdown for the grouping variable
grouping_options = ['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course']
dropdown = alt.binding_select(options=grouping_options, name='Group by:')
# A plain (non-selection) parameter holds the name of the chosen column
group_param = alt.param(name='selector', value='gender', bind=dropdown)
# Create a base chart
base = alt.Chart(data_long).transform_calculate(
    # Copy the column named by the parameter into a new 'group' field
    group=f'datum[{group_param.name}]'
).mark_rect().encode(
    x='subject:N',
    y=alt.Y('group:O', title='Group', axis=alt.Axis(labelLimit=300)),  # Increase the label limit
    color='mean(score):Q',
    tooltip=['group:O', 'subject:N', 'mean(score):Q']
).properties(
    width=600,
    height=400,
    title="Mean Student Scores by Grouping Variable",
    padding={"left": 150, "right": 50, "top": 10, "bottom": 30}  # Add padding to the left
).add_params(
    group_param
)
# Show the chart
base
# Save to HTML
base.save('heatmap_interactive.html')
The last line of the code above also saves the interactive heatmap as heatmap_interactive.html. Open that file in a web browser to use the "Group by:" dropdown outside the notebook. The dropdown switches between grouping variables so you can compare mean scores across subjects.
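Each option in the dropdown corresponds to an ordinary pandas group-by on the long-format data. The heatmap encodes these values only as color, so the table form is useful for reading exact numbers. This is a sketch that assumes `data_long` has the `subject` and `score` columns created by `melt` above; `mean_scores_by` is a hypothetical helper.

```python
import pandas as pd

GROUPING_OPTIONS = ["gender", "race/ethnicity", "parental level of education",
                    "lunch", "test preparation course"]

def mean_scores_by(data_long: pd.DataFrame, group: str) -> pd.DataFrame:
    """Mean score per level of `group` (rows) and subject (columns)."""
    if group not in GROUPING_OPTIONS:
        raise ValueError(f"unknown grouping variable: {group!r}")
    return data_long.groupby([group, "subject"])["score"].mean().unstack("subject")

# Made-up wide rows, melted the same way as in the notebook
toy = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "lunch": ["standard", "standard", "free/reduced", "free/reduced"],
    "math score": [70, 80, 50, 60],
    "reading score": [90, 70, 72, 58],
})
toy_long = toy.melt(id_vars=["gender", "lunch"], value_vars=["math score", "reading score"],
                    var_name="subject", value_name="score")
print(mean_scores_by(toy_long, "lunch"))
print(mean_scores_by(toy_long, "gender"))
```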
Based on the analysis of the "Students Performance in Exams" dataset, several key insights have been identified regarding the relationships between student demographics, socio-economic factors, and academic performance. These insights can help educators and policymakers understand and address the factors influencing student outcomes.
Reading Scores: Female students consistently outperform male students in reading. The gap is visible in each of the gender-split scatter plots above, which indicates that, on average, female students in this dataset are stronger readers.
Math Scores: Conversely, male students tend to perform better in math compared to female students. This suggests a gender disparity in performance that may warrant further investigation to understand underlying causes and address potential educational gaps.
Group E: Among the various race/ethnicity groups, students belonging to Group E show the highest average scores in both math and reading. This group's performance stands out, highlighting potential best practices or factors that could be emulated to improve performance in other groups.
Higher Education Levels: Students whose parents have attained higher education levels, particularly a Bachelor's or Master's degree, tend to achieve better scores in all subjects (math, reading, and writing). This association is consistent with parental education influencing student academic success, although the data alone show correlation rather than causation. It suggests that educational interventions may benefit from involving and educating parents.
Standard Lunch: Students who receive standard lunch tend to have higher scores compared to those who receive free or reduced lunch. However, it is important to note that this comparison might not fully capture the impact of lunch type, as we lack data on the students' performance before and after receiving free or reduced lunch. Further longitudinal studies are needed to determine the causal effects of lunch programs on academic performance.
Positive Impact of Test Prep: Completing a test preparation course is associated with better scores in math, reading, and writing. This finding highlights the value of targeted test preparation in enhancing student performance and suggests that expanding access to such resources could be beneficial.
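The introduction also asks what percentage of students who completed the test preparation course scored above a certain threshold. This is a minimal sketch that assumes the column uses the labels "completed" and "none". The threshold of 70 and the toy rows are arbitrary placeholders, and `share_above` is a hypothetical helper.

```python
import pandas as pd

def share_above(df: pd.DataFrame, subject: str, threshold: float = 70) -> pd.Series:
    """Percentage of students scoring strictly above `threshold`, per test-prep status."""
    above = df[subject] > threshold
    # The mean of a boolean Series is the fraction of True values
    return above.groupby(df["test preparation course"]).mean() * 100

# Made-up rows for illustration only
toy = pd.DataFrame({
    "test preparation course": ["completed"] * 4 + ["none"] * 4,
    "math score": [75, 80, 65, 90, 60, 72, 50, 55],
})
print(share_above(toy, "math score"))
```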
Targeted Interventions for Gender Disparities: Develop and implement programs aimed at supporting male students in reading and female students in math to address gender disparities.
Best Practices from Group E: Investigate and replicate the strategies or conditions that contribute to the success of Group E in other racial/ethnic groups.
Parental Involvement: Encourage and facilitate parental involvement in education, especially for parents with lower educational attainment, to boost student performance.
Comprehensive Lunch Program Evaluation: Conduct longitudinal studies to evaluate the impact of free and reduced lunch programs on academic performance, ensuring that comparisons account for changes over time.
Expand Test Preparation Resources: Increase accessibility to test preparation courses, particularly for students from socio-economically disadvantaged backgrounds, to help level the playing field.
These conclusions and recommendations provide a foundation for informed decision-making and strategic planning aimed at improving educational outcomes for all students.
", "text/plain": "alt.Chart(...)"}, "metadata": {}, "output_type": "display_data"}]}}, "6cf8bda6312049dc884a68ce08302e40": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {}}, "75aae03d908b4e5594f8eb0a8aa660dd": {"model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": {"layout": "IPY_MODEL_3f524e7fb43849f19e12e29e964b44d6", "outputs": [{"data": {"text/html": "\n
\n", "text/plain": "alt.Chart(...)"}, "metadata": {}, "output_type": "display_data"}]}}, "7d5e09be88a7421f998eca8cb802f0dc": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DropdownModel", "state": {"_options_labels": ["gender", "race/ethnicity", "parental level of education", "lunch", "test preparation course"], "description": "Group by:", "index": 1, "layout": "IPY_MODEL_e6966f9bad004bc795b163674a4413b3", "style": "IPY_MODEL_61f471d4718b4e36ab6fc324dc9428cb"}}, "7e1636bfe3284be9b504da3f84a231b4": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "VBoxModel", "state": {"_dom_classes": ["widget-interact"], "children": ["IPY_MODEL_7d5e09be88a7421f998eca8cb802f0dc", "IPY_MODEL_66fdf44803a34b76b31cfd7204bf3af7"], "layout": "IPY_MODEL_44327374eeec44f490740d7ef0ab73a1"}}, "9cd4f101c09f4d2d8a3ff7d9345b2a86": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {}}, "a8daebb6ba834990a383849eec37e6f1": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DropdownModel", "state": {"_options_labels": ["gender", "race/ethnicity", "parental level of education", "lunch", "test preparation course"], "description": "Group by:", "index": 0, "layout": "IPY_MODEL_9cd4f101c09f4d2d8a3ff7d9345b2a86", "style": "IPY_MODEL_3d2b27f9f8cb436ab19ae5d8be6bf424"}}, "b9dd435d871549998af5ea1ba28ad252": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {}}, "e6966f9bad004bc795b163674a4413b3": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {}}}, "version_major": 2, "version_minor": 0}